# English Speech Processing

Wav2vec2 Base Librispeech Demo Colab
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base on the LibriSpeech dataset, achieving a word error rate (WER) of 0.3174 on the evaluation set.
Speech Recognition Transformers
vishwasgautam
14
0
Distil Large V3.5 ONNX
MIT
Distil-Whisper is a knowledge-distilled version of OpenAI's Whisper-Large-v3, built to be smaller and faster than the original model.
Speech Recognition Transformers English
distil-whisper
25
1
Ichigo Llama3.1 S Instruct V0.3 Phase 3
Apache-2.0
Ichigo-llama3s is a series of large language models that accept both audio and text input, with a focus on improving speech understanding and user interaction.
Text-to-Audio English
homebrewltd
43
35
Whisper Ner V1
MIT
WhisperNER is a model that performs speech transcription and named entity recognition (NER) jointly, with support for open-type NER.
Speech Recognition Supports Multiple Languages
aiola
174
23
Wav2vec2 Large Lv60 Phoneme Timit English Timit 4k 002
Apache-2.0
An English phoneme recognition model fine-tuned from facebook/wav2vec2-large-lv60 on the TIMIT dataset, achieving a phoneme error rate of 10.53%.
Speech Recognition Transformers English
excalibur12
103
1
Gazelle V0.2
Apache-2.0
Gazelle v0.2 is a joint speech-language model released by Tincans, supporting English.
Text-to-Audio Transformers English
tincans-ai
90
99
Wav2vec2 Large Xlsr 53 English Finetuned Ravdess
Apache-2.0
A speech emotion recognition model fine-tuned from wav2vec2-large-xlsr-53-english on the RAVDESS dataset.
Audio Classification Transformers
firdho26
68
0
Wav2vec2 Lg Xlsr En Speech Emotion Recognition Finetuned Ravdess V8
Apache-2.0
An English speech emotion recognition model based on the wav2vec2 architecture, fine-tuned on the RAVDESS dataset.
Audio Classification Transformers
Wiam
94
4
Wav2vec2 Base Speech Emotion Recognition
Apache-2.0
A speech emotion recognition model fine-tuned from facebook/wav2vec2-base that predicts the speaker's emotion in audio samples.
Audio Classification Transformers English
DunnBC22
128
13
Wav2vec2 Large 960h Intent Classification Ori
Apache-2.0
An intent classification model fine-tuned from facebook/wav2vec2-large-960h, achieving 77.08% accuracy on the evaluation set.
Audio Classification Transformers
MuhammadIqbalBazmi
15
0
Wav2vec2 Large Tedlium
Apache-2.0
A Wav2Vec2 large speech recognition model fine-tuned on the TEDLIUM corpus for English speech-to-text.
Speech Recognition English
sanchit-gandhi
58
1
Wav2vec2 Base Timit Demo Colab
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset, featuring a low word error rate (WER).
Speech Recognition Transformers
nawta
96
1
Wav2vec2 Base Timit Demo Google Colab
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset, specializing in English speech-to-text tasks.
Speech Recognition Transformers
dasolj
127
0
Wav2vec Cv
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base-960h.
Speech Recognition Transformers
eugenetanjc
69
0
Wav2vec Mle
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base-960h, achieving a word error rate (WER) of 1.0 on the evaluation set.
Speech Recognition Transformers
eugenetanjc
68
0
Wav2vec2 Base Dataset Asr Demo Colab
Apache-2.0
A speech recognition model fine-tuned from distilhubert on the superb dataset, primarily used for automatic speech recognition (ASR) tasks.
Speech Recognition Transformers
aminnaghavi
34
0
Wav2vec2 Base Timit Demo Google Colab
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset, achieving a word error rate (WER) of 0.3384 on the evaluation set.
Speech Recognition Transformers
mikeluck
38
0
Assignment1 Francesco
MIT
An automatic speech recognition (ASR) model based on the Speech-to-Text Transformer (S2T), designed for English speech recognition.
Speech Recognition Transformers English
Classroom-workshop
22
0
Wav2vec2 19
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base for automatic speech-to-text tasks.
Speech Recognition Transformers
chrisvinsen
18
0
Xlsr English
Apache-2.0
An English speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on the librispeech_asr dataset.
Speech Recognition Transformers
ashesicsis1
18
0
Wav2vec2 Base Timit Demo Google Colab
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset, focusing on English speech-to-text tasks.
Speech Recognition Transformers
wrice
17
0
Wav2vec2 Base Timit Google Colab
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset, achieving a word error rate (WER) of 0.3355 on the evaluation set.
Speech Recognition Transformers
anithapappu
19
0
Wav2vec2 7
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base, achieving a word error rate (WER) of 0.52 on the evaluation set.
Speech Recognition Transformers
chrisvinsen
20
0
D L Dl
A speech recognition model fine-tuned from facebook/wav2vec2-base-960h, achieving a word error rate (WER) of 1.0 on the evaluation set.
Speech Recognition Transformers
bkh6722
25
0
Wav2vec2 Base Timit Demo Google Colab
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset, suitable for English speech-to-text tasks.
Speech Recognition Transformers
BitanBiswas
28
0
Wav2vec2 Base Timit Demo Google Colab
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset, primarily used for English speech-to-text tasks.
Speech Recognition Transformers
patrickvonplaten
26
2
Wav2vec2 Base Timit Demo Colab92
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset.
Speech Recognition Transformers
hassnain
16
0
Wav2vec2 Base Timit Demo Colab90
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset, specializing in English speech-to-text tasks.
Speech Recognition Transformers
hassnain
16
0
Wav2vec2 Base Timit Demo Colab11
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base, achieving a word error rate (WER) of 0.4348 on the TIMIT dataset.
Speech Recognition Transformers
sameearif88
18
0
Wav2vec2 Base Timit Demo Colab 1
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset, with a word error rate (WER) of 0.4398.
Speech Recognition Transformers
zasheza
18
0
Wav2vec2 Base Timit Demo Colab2
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base, achieving a word error rate (WER) of 0.5664 on the evaluation set.
Speech Recognition Transformers
sameearif88
16
0
Wav2vec2 Base Timit Ali Hasan Colab EX2
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset, with a word error rate (WER) of 0.4458 on the evaluation set.
Speech Recognition Transformers
ali221000262
23
0
Wav2vec2 Base Timit Ali Hasan Colab
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset.
Speech Recognition Transformers
ali221000262
25
0
Wav2vec2 Base Timit Moaiz Exp2
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset.
Speech Recognition Transformers
moaiz237
24
0
Wav2vec2 Base Timit Demo Colab
Apache-2.0
A speech recognition model fine-tuned from wav2vec2-base on the TIMIT dataset.
Speech Recognition Transformers
ali221000262
23
0
Wav2vec2 Base Timit Demo Colab
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base, trained and evaluated on the TIMIT dataset.
Speech Recognition Transformers
shumail
24
0
Wav2vec2 Base Timit Demo Colab
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset, intended for demonstration purposes.
Speech Recognition Transformers
moaiz237
24
0
Wav2vec2 Base 960h Timit Demo Colab
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base-960h, achieving a 21.6% word error rate (WER) on the TIMIT dataset.
Speech Recognition Transformers
obokkkk
20
1
Wav2vec2 Base Timit Demo Colab
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset, with a word error rate (WER) of 0.3468.
Speech Recognition Transformers
obokkkk
20
0
Wav2vec2 Child En Tokenizer 4
Apache-2.0
A fine-tuned version of facebook/wav2vec2-xls-r-300m, designed for English child speech recognition tasks.
Speech Recognition Transformers
jaeyeon
16
1
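
Several entries above quote a word error rate (WER), most as a fraction (for example 0.3174) and one as a percentage (21.6%). As a rough guide to reading those figures, here is a minimal sketch of the usual WER definition: the word-level edit distance between a reference transcript and a model's hypothesis, divided by the number of reference words. The function name `word_error_rate` and the sample sentences are illustrative assumptions. This is not the evaluation code behind any listed model, and real evaluations may normalize text (casing, punctuation) differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count.

    Hypothetical helper for illustration; splits on whitespace only and
    applies no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up sentences, not taken from LibriSpeech or TIMIT.
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    print(f"WER = {word_error_rate(reference, hypothesis):.4f}")
```

On this scale, a WER of 1.0 means the number of word errors equals the number of reference words, and a figure quoted as 21.6% corresponds to 0.216. Because insertions count as errors, WER can exceed 1.0.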
© 2025 AIbase